Word Sense Induction using Cluster Ensemble
Abstract
In this paper, we describe the implementation of an unsupervised learning method for Chinese word sense induction (WSI) in the CIPS-SIGHAN-2010 bakeoff. We present three individual clustering algorithms and their ensemble, and discuss in particular different approaches to representing text and selecting features. Our main system, based on the cluster ensemble, achieves an F-score of 79.33%, the best result in this WSI task. Our experiments also demonstrate the versatility and effectiveness of the proposed model in handling data sparseness.
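The abstract does not say how the three base clusterings are combined. A common way to build a cluster ensemble is evidence accumulation. It counts how often each pair of instances lands in the same cluster across the base clusterings (a co-association matrix), then links pairs that agree in more than a chosen fraction of clusterings. The sketch below shows that approach only; it is not necessarily the authors' method. The function names, the `threshold` parameter, and the three label vectors are hypothetical.

```python
from itertools import combinations


def co_association(labelings):
    """Count, for every pair of instances, how many base clusterings put them together."""
    n = len(labelings[0])
    counts = [[0] * n for _ in range(n)]
    for labels in labelings:
        for i, j in combinations(range(n), 2):
            if labels[i] == labels[j]:
                counts[i][j] += 1
                counts[j][i] += 1
    for i in range(n):
        counts[i][i] = len(labelings)  # an instance always agrees with itself
    return counts


def ensemble_consensus(labelings, threshold=0.5):
    """Link instances whose co-association ratio exceeds `threshold` (a hypothetical
    parameter); each connected component of the resulting graph is one induced sense."""
    counts = co_association(labelings)
    n, m = len(counts), len(labelings)
    parent = list(range(n))  # union-find forest over instances

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    for i, j in combinations(range(n), 2):
        if counts[i][j] / m > threshold:
            parent[find(i)] = find(j)

    # Relabel components as 0, 1, 2, ... in order of first appearance.
    ids = {}
    return [ids.setdefault(find(i), len(ids)) for i in range(n)]


# Hypothetical outputs of three base clusterers on six instances of one target word.
base = [
    [0, 0, 0, 1, 1, 1],
    [1, 1, 0, 0, 0, 0],
    [0, 0, 0, 1, 1, 2],
]
print(ensemble_consensus(base))  # -> [0, 0, 0, 1, 1, 1]
```

No single base clusterer is trusted on its own here. Instance 2 joins instances 0 and 1 because two of the three clusterers agree on that, while the third clusterer's extra split of instance 5 is outvoted.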
Similar resources
KSU KDD: Word Sense Induction by Clustering in Topic Space
We describe our language-independent unsupervised word sense induction system. This system uses only topic features to cluster different word senses in their global context topic space. Using unlabeled data, the system trains a latent Dirichlet allocation (LDA) topic model and then uses it to infer the topic distributions of the test instances. By clustering these topic distributions in their top...
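Once LDA has produced a topic distribution for each test instance, the clustering step can be sketched as below. The abstract is cut off before it names its clustering algorithm, so plain k-means is substituted here as an assumption. The distributions in `theta` are made-up values standing in for LDA output, and training and inference of the LDA model itself are not shown.

```python
def sq_dist(p, q):
    """Squared Euclidean distance between two topic distributions."""
    return sum((a - b) ** 2 for a, b in zip(p, q))


def kmeans(points, k, max_iter=100):
    """Plain k-means with a deterministic farthest-first initialization."""
    centers = [list(points[0])]
    while len(centers) < k:
        farthest = max(points, key=lambda p: min(sq_dist(p, c) for c in centers))
        centers.append(list(farthest))
    labels = None
    for _ in range(max_iter):
        new_labels = [min(range(k), key=lambda j: sq_dist(p, centers[j])) for p in points]
        if new_labels == labels:
            break  # assignments are stable: converged
        labels = new_labels
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:
                centers[j] = [sum(dim) / len(members) for dim in zip(*members)]
    return labels


# Hypothetical per-instance topic distributions over 3 topics, standing in
# for LDA inference on four test instances of one target word.
theta = [
    [0.80, 0.10, 0.10],
    [0.70, 0.20, 0.10],
    [0.10, 0.10, 0.80],
    [0.20, 0.10, 0.70],
]
print(kmeans(theta, k=2))  # -> [0, 0, 1, 1]
```

The mean of several probability vectors is itself a probability vector, so every k-means center remains a valid topic distribution.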
Duluth: Word Sense Induction Applied to Web Page Clustering
The Duluth systems that participated in task 11 of SemEval-2013 carried out word sense induction (WSI) in order to cluster Web search results. They relied on an approach that represented Web snippets using second-order co-occurrences. These systems were all implemented using SenseClusters, a freely available open-source software package.
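A second-order co-occurrence representation describes a context by the words its own words tend to co-occur with. Because of this, two snippets with no word in common can still look similar. The sketch below illustrates the idea with a toy English corpus and a simple averaging scheme; the helper names are hypothetical. It does not reproduce SenseClusters, whose actual options and weighting are not described here.

```python
import math
from collections import Counter, defaultdict


def first_order_vectors(corpus, window=2):
    """Map each word to a Counter of the words found within `window` tokens of it."""
    vectors = defaultdict(Counter)
    for sentence in corpus:
        tokens = sentence.split()
        for i, word in enumerate(tokens):
            for j in range(max(0, i - window), min(len(tokens), i + window + 1)):
                if j != i:
                    vectors[word][tokens[j]] += 1
    return vectors


def second_order(context, vectors):
    """Represent a context by the average first-order vector of its known words."""
    words = [w for w in context.split() if w in vectors]
    total = Counter()
    for w in words:
        total.update(vectors[w])
    return {f: c / len(words) for f, c in total.items()} if words else {}


def cosine(u, v):
    """Cosine similarity of two sparse vectors stored as dicts."""
    dot = sum(x * v.get(f, 0.0) for f, x in u.items())
    nu = math.sqrt(sum(x * x for x in u.values()))
    nv = math.sqrt(sum(x * x for x in v.values()))
    return dot / (nu * nv) if nu and nv else 0.0


# Toy corpus and snippets (hypothetical). "fish water" and "river" share no
# word, yet their second-order vectors overlap.
corpus = ["bank river water", "bank money loan", "river water fish", "money loan interest"]
vecs = first_order_vectors(corpus)
a = second_order("fish water", vecs)
b = second_order("river", vecs)
c = second_order("interest", vecs)
print(round(cosine(a, b), 3))  # -> 0.471
print(cosine(a, c))  # -> 0.0
```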
Semi-supervised Learning by Fuzzy Clustering and Ensemble Learning
This paper proposes a semi-supervised learning method that uses fuzzy clustering to solve word sense disambiguation problems. Furthermore, we reduce the side effects of semi-supervised learning through ensemble learning. We set one class for each labeled instance, and each labeled instance serves as the prototype of its own class. By using fuzzy clustering for unlabeled instances, prototypes are moved to more sui...
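The prototype idea can be sketched with standard fuzzy c-means whose initial centers are the labeled instances. Each unlabeled instance is then assigned to the class in which it has the highest membership. This is a minimal sketch under that assumption, and several details are hypothetical: the fuzzifier `m=2.0`, the fixed iteration count, the toy 2-D vectors, and the sense names. The paper's ensemble step is not shown.

```python
import math


def fuzzy_c_means(points, prototypes, m=2.0, iterations=50, eps=1e-9):
    """Fuzzy c-means whose initial centers are the labeled prototypes.

    Returns the moved centers and each point's membership in every class
    (memberships come from the last iteration).
    """
    centers = [list(p) for p in prototypes]
    c = len(centers)
    for _ in range(iterations):
        # Membership update: u_i = 1 / sum_k (d_i / d_k) ** (2 / (m - 1))
        memberships = []
        for x in points:
            d = [max(math.dist(x, ctr), eps) for ctr in centers]  # eps avoids division by zero
            memberships.append(
                [1.0 / sum((d[i] / d[k]) ** (2 / (m - 1)) for k in range(c)) for i in range(c)]
            )
        # Center update: mean of the points weighted by membership ** m
        for i in range(c):
            weights = [u[i] ** m for u in memberships]
            total = sum(weights)
            centers[i] = [
                sum(w * x[dim] for w, x in zip(weights, points)) / total
                for dim in range(len(points[0]))
            ]
    return centers, memberships


# Hypothetical data: one labeled prototype per class, four unlabeled instances.
labeled = [[0.0, 0.0], [10.0, 10.0]]
senses = ["sense_a", "sense_b"]
unlabeled = [[1.0, 1.0], [2.0, 1.0], [9.0, 8.0], [8.0, 9.0]]
centers, u = fuzzy_c_means(unlabeled, labeled)
predicted = [senses[max(range(len(senses)), key=row.__getitem__)] for row in u]
print(predicted)  # -> ['sense_a', 'sense_a', 'sense_b', 'sense_b']
```

The prototypes start on the labeled points and drift toward the unlabeled data. For example, the first center moves from (0, 0) to roughly (1.5, 1).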
Applying Spectral Clustering for Chinese Word Sense Induction
Sense induction is the process of identifying the sense of a word given its context, and it is often treated as a clustering task. This paper explores the use of a spectral clustering method that incorporates word features and n-gram features to determine which cluster each occurrence of the word belongs to; each cluster represents one sense in the given document set.
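The spectral step can be illustrated with a two-way split. First build an affinity matrix from the overlap of each instance's word and n-gram features. Then divide the instances by the sign of the Fiedler vector, which is the eigenvector of the second-smallest eigenvalue of the graph Laplacian. This is a simplified sketch, not the paper's method. It uses Jaccard similarity, an unnormalized Laplacian, power iteration instead of a full eigensolver, and only two clusters, whereas a system inducing k senses would typically run k-means on several eigenvectors. The `w:`/`ng:` feature strings are hypothetical.

```python
import math


def jaccard(a, b):
    """Overlap of two feature sets."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def spectral_bipartition(features, iterations=500):
    """Split instances in two by the sign of the Fiedler vector of L = D - W."""
    n = len(features)
    w = [[0.0 if i == j else jaccard(features[i], features[j]) for j in range(n)]
         for i in range(n)]
    degree = [sum(row) for row in w]
    shift = 2.0 * max(degree)  # >= largest eigenvalue of L (Gershgorin bound)
    # Deterministic start vector, centered so it is orthogonal to the all-ones vector.
    v = [i - (n - 1) / 2 for i in range(n)]
    for _ in range(iterations):
        # Power iteration on (shift * I - L), where (L v)_i = degree_i * v_i - sum_j w_ij * v_j
        v = [shift * v[i] - degree[i] * v[i] + sum(w[i][j] * v[j] for j in range(n))
             for i in range(n)]
        mean = sum(v) / n
        v = [x - mean for x in v]  # project out the constant eigenvector of L
        norm = math.sqrt(sum(x * x for x in v))
        if norm == 0:
            break  # degenerate graph (e.g. no edges): nothing left to iterate
        v = [x / norm for x in v]
    side = [x >= 0 for x in v]
    # Relabel so that instance 0 is always in cluster 0.
    return [0 if s == side[0] else 1 for s in side]


# Hypothetical word ("w:") and n-gram ("ng:") features for six occurrences of "bank".
contexts = [
    {"w:river", "w:water", "ng:river_bank"},
    {"w:river", "w:water", "ng:water_bank"},
    {"w:river", "w:fish", "ng:river_bank"},
    {"w:money", "w:loan", "ng:bank_loan"},
    {"w:money", "w:loan", "ng:bank_account"},
    {"w:money", "w:fish", "ng:bank_loan"},
]
print(spectral_bipartition(contexts))  # -> [0, 0, 0, 1, 1, 1]
```

The shift turns the smallest nontrivial Laplacian eigenvalue into the largest one, so plain power iteration converges to the Fiedler vector once the constant vector is projected out at each step.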
Chinese Word Sense Induction based on Hierarchical Clustering Algorithm
Sense induction seeks to automatically identify the senses of polysemous words encountered in a corpus. Unsupervised word sense induction can be viewed as a clustering problem. In this paper, we use a hierarchical clustering algorithm as the classifier for word sense induction. Experiments show that the system achieves an F-score of 72% on the training corpus and 65% on the test corpus.
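A bottom-up (agglomerative) hierarchical clustering of instances can be sketched as follows. The abstract specifies neither the linkage criterion nor the stopping rule. Average linkage and a known number of senses `k` are therefore assumptions here, as is the toy similarity matrix.

```python
def average_linkage(a, b, sim):
    """Mean pairwise similarity between two clusters of instance indices."""
    return sum(sim[i][j] for i in a for j in b) / (len(a) * len(b))


def agglomerative(sim, k):
    """Repeatedly merge the most similar pair of clusters until k clusters remain."""
    clusters = [[i] for i in range(len(sim))]
    while len(clusters) > k:
        x, y = max(
            ((p, q) for p in range(len(clusters)) for q in range(p + 1, len(clusters))),
            key=lambda pair: average_linkage(clusters[pair[0]], clusters[pair[1]], sim),
        )
        clusters[x].extend(clusters[y])
        del clusters[y]  # y > x, so index x is still valid
    labels = [0] * len(sim)
    for label, members in enumerate(clusters):
        for i in members:
            labels[i] = label
    return labels


# Hypothetical pairwise similarities between five instances of one target word.
sim = [
    [1.0, 0.9, 0.8, 0.1, 0.0],
    [0.9, 1.0, 0.7, 0.2, 0.1],
    [0.8, 0.7, 1.0, 0.1, 0.2],
    [0.1, 0.2, 0.1, 1.0, 0.85],
    [0.0, 0.1, 0.2, 0.85, 1.0],
]
print(agglomerative(sim, k=2))  # -> [0, 0, 0, 1, 1]
```

Stopping at a fixed `k` is the simplest rule. Cutting the merge tree at a similarity threshold instead would let the number of induced senses vary per word.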